Performance comparison between k-tuple distance and four model-based distances in phylogenetic tree reconstruction
Authors
Abstract
Phylogenetic tree reconstruction requires construction of a multiple sequence alignment (MSA) from sequences. Computationally, it is difficult to achieve an optimal MSA for many sequences. Moreover, even if an optimal MSA is obtained, it may not be the true MSA that reflects the evolutionary history of the underlying sequences. Errors can therefore be introduced during MSA construction, and these in turn affect the subsequent phylogenetic tree construction. To circumvent this issue, we extend the application of the k-tuple distance to phylogenetic tree reconstruction. The k-tuple distance between two sequences is the sum of the differences in frequency, over all possible tuples of length k, between the sequences, and it can be estimated without MSAs. It has traditionally been used to build a fast 'guide tree' to assist the construction of MSAs. Using 1470 simulated sets of sequences generated under different evolutionary scenarios, and building both neighbor-joining trees and BioNJ trees, we compared the performance of the k-tuple distance with four commonly used distance estimators: Jukes-Cantor, Kimura, F84 and Tamura-Nei. These four estimators are model-based distance estimators, as each of them assumes a specific substitution model in order to compute the distance between a pair of already aligned sequences. Results show that trees constructed from the k-tuple distance are more accurate than those from the other distances most of the time; when the divergence between the underlying sequences is high, the tree accuracy obtained with the k-tuple distance can be twice that of the other estimators, or higher. Furthermore, because the k-tuple distance avoids the need to construct an MSA, it can save a tremendous amount of time in phylogenetic tree reconstruction when the data include a large number of sequences.
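The alignment-free distance described in the abstract can be illustrated with a minimal sketch. It assumes that "difference in frequency" means the absolute difference of relative k-tuple frequencies; the abstract does not state the exact formula, so the paper's definition may differ (for example, squared differences or raw counts). The function names `ktuple_counts`, `ktuple_distance` and `distance_matrix` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from itertools import combinations


def ktuple_counts(seq, k):
    """Count every overlapping tuple (substring) of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a, b, k=3):
    """Sum of absolute differences in relative k-tuple frequency.

    Assumption: absolute differences of relative frequencies are used;
    the source abstract only says "sum of the differences in frequency".
    Tuples absent from both sequences contribute zero, so only observed
    tuples need to be visited.
    """
    ca, cb = ktuple_counts(a, k), ktuple_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    if na == 0 or nb == 0:
        raise ValueError("both sequences must be at least k characters long")
    return sum(abs(ca[t] / na - cb[t] / nb) for t in ca.keys() | cb.keys())


def distance_matrix(seqs, k=3):
    """Pairwise distances for a dict {name: sequence}; no alignment needed."""
    names = list(seqs)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        dist[n][m] = dist[m][n] = ktuple_distance(seqs[n], seqs[m], k)
    return dist


if __name__ == "__main__":
    toy = {"s1": "ACGTACGTAC", "s2": "ACGTTCGTAC", "s3": "TTTTGGGGCC"}
    for name, row in distance_matrix(toy, k=2).items():
        print(name, row)
```

A matrix built this way could then be handed to any neighbor-joining or BioNJ implementation, which is the step where the model-based estimators would otherwise require an MSA first.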
Similar resources
A Novel Genetic Algorithm based Approach for Optimization of Distance Matrix for Phylogenetic Tree Construction
Phylogenies are useful for organizing knowledge of biological diversity, for structuring classifications, and for providing knowledge of events that occurred during evolution. Different phylogenetic reconstruction techniques are available. In this paper, a distance-based technique is used. The distance measure is an important issue in phylogenetic analysis. Traditional approaches are time-consuming du...
LifePrint: a novel k-tuple distance method for construction of phylogenetic trees
PURPOSE: Here we describe LifePrint, a sequence alignment-independent k-tuple distance method to estimate relatedness between complete genomes. METHODS: We designed a representative sample of all possible DNA tuples of length 9 (9-tuples). The final sample comprises 1878 tuples (called the LifePrint set of 9-tuples; LPS9) that are distinct from each other by at least two internal and noncontigu...
Towards Extracting All Phylogenetic Information from Matrices of Evolutionary Distances
The matrix of evolutionary distances is a model-based statistic, derived from molecular sequences, summarizing the pairwise phylogenetic relationships between a collection of species. Phylogenetic tree reconstruction methods relying on this matrix are relatively fast and thus widely used in molecular systematics. However, because of their intrinsic reliance on summary statistics, distance-matri...
Genes order and phylogenetic reconstruction: application to γ-Proteobacteria
We study the problem of phylogenetic reconstruction based on gene order for whole genomes. We define three genomic distances between whole genomes represented by signed sequences, based on the matching of similar segments of genes and on the notions of breakpoints, conserved intervals and common intervals. We use these distances and distance based phylogenetic reconstruction methods to compute ...